Pokemon classification with a Support Vector Machine

BSHT Michielsen MSc

This notebook demonstrates how to use a Support Vector Machine (SVM) for image classification. Image recognition is the ability of a computer to identify an object in an image based on its visual characteristics. This is a classification problem: each possible object is a class, and a given image should map to exactly one class with as high a certainty as feasible. Training a classification model for this requires a large number of images of each object. Relative to this notebook there should be a folder named data containing several Pokemon images. These images are a subset of the Pokemon collection by Lance Zhang, selected because the chosen Pokemon have strikingly different colors, so the machine can hopefully distinguish them fairly well. More images of the same Pokemon, or even of different Pokemon, can be downloaded and added to the data folder.

First, the versions of the required libraries are shown. It is always wise to report the versions of the libraries used, so that if problems arise in the future one can return to a state in which the notebook worked.

In [ ]:
import copy, pathlib, math
import PIL.Image as Image

import sklearn
import numpy
import matplotlib
import matplotlib.pyplot as plt

print("scikit-learn version:", sklearn.__version__)     # 1.3.0
print("numpy version:", numpy.__version__)              # 1.24.4
print("matplotlib version:", matplotlib.__version__)    # 3.7.2
scikit-learn version: 1.3.0
numpy version: 1.24.4
matplotlib version: 3.7.2

📦 Data provisioning

In real life, the data provisioning phase is likely to include additional steps for data sourcing and data quality; for demo purposes, however, this notebook is restricted to merely loading the images from the data folder, without any concern for quantity or quality.

The code below loads the images and uses the subfolder names as the class labels. It is important that all images are the same size (and in this case square as well), so the code resizes them automatically. If high-resolution images are available, the size parameter can be increased, which will probably improve performance slightly at significantly increased training time. The chosen size of 256 is a middle way intended to give fair results at a reasonable training time.

In [ ]:
size = 256

def load_image(file, size):
    img = Image.open(file)
    img = img.resize((size, size))
    return numpy.array(img).flatten()

def load_labelled_images(path, size):
    labels = list()
    files = list()
    # The name of each image's parent folder is its class label.
    for file_info in pathlib.Path(path).glob("**/*.jpeg"):
        labels.append(file_info.parent.name)
        files.append(str(file_info))
    imgs = numpy.array([load_image(f, size) for f in files])
    return imgs, numpy.array(labels)

images, labels = load_labelled_images("./data", size)
print("Loaded", len(images), "images in the following", len(numpy.unique(labels)), "classes:")
for label in numpy.unique(labels):
    print(label)
Loaded 80 images in the following 3 classes:
car
dog
house

📃 Sample the data

To get an impression of the data, a sample of the loaded images is plotted here to see if they were loaded correctly. The parameter sample_size can be increased if more images should be shown.

In [ ]:
sample_size = 24


plotimgs = copy.deepcopy(images)
numpy.random.shuffle(plotimgs)
rows = plotimgs[:sample_size]

_, subplots = plt.subplots(nrows = math.ceil(sample_size/8), ncols = 8, figsize=(18, int(sample_size/3)))
subplots = subplots.flatten()
for i, x in enumerate(rows):
    subplots[i].imshow(numpy.reshape(x, [size, size, 3]))
    subplots[i].set_xticks([])
    subplots[i].set_yticks([])
[Output: a grid of randomly sampled images]

🛠️ Preprocessing

Given that this case uses images, there is no feature selection step: one cannot decide beforehand that some pixels are better indicators than others. Therefore, there is little to do in terms of preprocessing other than splitting the dataset into a trainset and a testset.
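One optional step that often helps SVMs is scaling the pixel values to a common range, since SVMs are sensitive to feature scale. A minimal sketch, using a hypothetical stand-in array rather than the notebook's actual images:

```python
import numpy

# Hypothetical stand-in: a batch of 10 flattened RGB images with
# raw pixel values in 0..255, like the arrays produced by load_image.
images = numpy.random.randint(0, 256, size=(10, 256 * 256 * 3))

# Dividing by 255 maps every pixel into [0, 1] without changing the
# relative colour information the SVM relies on.
images_scaled = images / 255.0

print(images_scaled.min(), images_scaled.max())
```

Whether this helps depends on the data; it is left out of the main pipeline here.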

🪓 Splitting into train/test

A split of 70%/30% is chosen here in order to have a fairly large number of testing images.

In [ ]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=.3, random_state=0)
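With a dataset this small, it can also be worth passing stratify to train_test_split so that the class proportions stay the same in both splits. A sketch with hypothetical stand-in data in place of the image arrays:

```python
from sklearn.model_selection import train_test_split
import numpy

# Hypothetical stand-in data: 30 samples, 3 balanced classes.
X = numpy.arange(60).reshape(30, 2)
y = numpy.array(["car"] * 10 + ["dog"] * 10 + ["house"] * 10)

# stratify=y keeps the class proportions identical in train and test,
# which avoids a test set dominated by one class on small datasets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

print(numpy.unique(y_test, return_counts=True))
```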

🧬 Modelling

In this step the model, in this case a Support Vector Machine for classification, is fitted on the trainset only.

In [ ]:
from sklearn.svm import SVC
model = SVC(C=1.0)

model.fit(X_train, y_train)
score = model.score(X_test, y_test)
print("Accuracy:", score)

# The code below was used to compare the different kernels
# (results are discussed further down):

# rbf = SVC(kernel='rbf', C=1.0)
# sigmoid = SVC(kernel='sigmoid', C=1.0)
# linear = SVC(kernel='linear', C=1.0)
# poly = SVC(kernel='poly', C=1.0, degree=2)

# poly.fit(X_train, y_train)
# linear.fit(X_train, y_train)
# rbf.fit(X_train, y_train)
# sigmoid.fit(X_train, y_train)

# linearScore = linear.score(X_test, y_test)
# polyScore = poly.score(X_test, y_test)
# rbfScore = rbf.score(X_test, y_test)
# sigmoidScore = sigmoid.score(X_test, y_test)

# print("Accuracy for linear:", linearScore)
# print("Accuracy for polynomial:", polyScore)
# print("Accuracy for rbf:", rbfScore)
# print("Accuracy for sigmoid:", sigmoidScore)

## Results for accuracy with the default (rbf) kernel and different C values:
# 0.8253968253968254 for C=2.0
# 0.57 for C=0.5
# 0.8253968253968254 for C=1.0
Accuracy: 0.875
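A single 70%/30% split gives a somewhat noisy accuracy estimate; cross-validation averages the score over several splits for a more stable figure. A minimal sketch, using synthetic data from make_classification in place of the image arrays:

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Hypothetical stand-in for the image data: cross-validation only
# needs (X, y) arrays, so a small synthetic set suffices here.
X, y = make_classification(n_samples=80, n_features=20, random_state=0)

# 5-fold cross-validation fits and scores the model five times,
# each time holding out a different fifth of the data.
scores = cross_val_score(SVC(C=1.0), X, y, cv=5)
print("Mean accuracy:", scores.mean())
```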

🔬 Evaluation

Below, a classification report is printed. It shows how well the model performed on each of the classes.

In [ ]:
from sklearn.metrics import classification_report
predictions = model.predict(X_test)
report = classification_report(y_test, predictions)
print(report)
              precision    recall  f1-score   support

         car       0.75      0.75      0.75         4
         dog       0.89      0.89      0.89         9
       house       0.91      0.91      0.91        11

    accuracy                           0.88        24
   macro avg       0.85      0.85      0.85        24
weighted avg       0.88      0.88      0.88        24
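A confusion matrix complements the report by showing exactly which classes get mixed up. A sketch with hypothetical labels illustrating the format; in the notebook one would pass y_test and predictions from the cells above instead:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels for illustration only.
y_true = ["car", "car", "dog", "dog", "house", "house"]
y_pred = ["car", "dog", "dog", "dog", "house", "house"]

# Rows are true classes, columns are predicted classes (in the order
# given by labels), so off-diagonal cells count the confusions.
cm = confusion_matrix(y_true, y_pred, labels=["car", "dog", "house"])
print(cm)
```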

It appears that the car class is somewhat harder to recognize, but the others all seem to do well. The code below plots every image in the testset with its predicted label and an indication of whether the prediction was correct.

In [ ]:
_, subplots = plt.subplots(nrows = math.ceil(len(X_test)/4), ncols = 4, figsize=(15, len(X_test)))
subplots = subplots.flatten()

for i, x in enumerate(X_test):
    subplots[i].imshow(numpy.reshape(x, [size, size, 3]))
    subplots[i].set_xticks([])
    subplots[i].set_yticks([])
    subplots[i].set_title(predictions[i] + (" (correct)" if predictions[i] == y_test[i] else " (wrong)"))
[Output: every testset image with its predicted label]

Even a relatively simple Support Vector Machine with just minutes of training time can do reasonably well at image recognition. A deep learning CNN would perhaps do even better, but at a largely increased need for training resources and time. When the number of Pokemon increases and others with similar colours are added, this model's quality is likely to decrease quite rapidly; at that point the quality of the images should probably also be improved to help the machine. For example, the current images are of rather poor resolution and some even have significant background noise. Cleaner, high-quality, high-resolution images may improve the general outcome.

Hyperparameter C

As we know, the default value of C is 1.0. A Support Vector Machine tries to balance maximizing the margin against minimizing the classification error. A smaller value such as 0.5 means the machine is more tolerant of misclassification and will return a wider margin. No kernel was specified, so the default, rbf, was used. The accuracy for C=0.5 was 0.57; for C=2.0 it was the same as for C=1.0, namely 0.83. C=0.5 would certainly not be the best choice in our case, since its accuracy is clearly lower than for 1.0 and 2.0.
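The C sweep described above can be sketched as a small loop. Synthetic data from make_classification stands in for the image arrays here, so the printed accuracies are illustrative, not the notebook's reported numbers:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data for the sweep.
X, y = make_classification(n_samples=120, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Small C tolerates more misclassification (wider margin);
# large C penalizes errors harder (tighter fit).
for C in [0.5, 1.0, 2.0]:
    score = SVC(C=C).fit(X_train, y_train).score(X_test, y_test)
    print(f"C={C}: accuracy={score:.2f}")
```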

Hyperparameter kernel

The kernel hyperparameter accepts linear, poly, rbf, sigmoid, precomputed, or a callable.

Choice of kernel

These are the results for the different kernels:

Accuracy for linear: 0.8412698412698413
Accuracy for polynomial: 0.8253968253968254
Accuracy for rbf: 0.8253968253968254
Accuracy for sigmoid: 0.19047619047619047

Based on these results we can conclude that the linear kernel gives the best model, since it has the highest accuracy, followed by the polynomial and rbf kernels, with sigmoid last. Rbf is usually the safest kernel choice when the nature of the data is unknown. The linear kernel is most suitable when the data is linearly separable, in other words when a straight line can be drawn between the classes. The polynomial kernel suits data with a non-linear decision boundary; when working with this kernel it is important to choose an appropriate degree, since larger degrees tend to overfit (the most common degree is 2). The sigmoid kernel scored lowest; it is not widely used in practice and generally does not perform as well as the other kernels.

Moaaah Pokemon

When we first started with 6 classes, the overall accuracy was 87%. Some classes had relatively high precision, recall and f1-score, for example Bulbasaur, Charmander and Electrode; on the other hand, Mewtwo had a very low recall. When we added 4 more classes, the results dropped further: with more classes, the chance of misclassification grows. The overall accuracy became 71%, and while precision, recall and f1-score vary per class, it is clear that the model has difficulty distinguishing some of them. Both scenarios show that some classes perform better than others; visual similarities between Pokemon can lead to confusion for the model. Overall, reducing the number of classes helps the model achieve better results.

Car, dog, house

Overall, the accuracy of the model on our new dataset consisting of cars, dogs and houses is 88%, which is relatively high. The macro-averaged f1-score is a balanced 85% and the weighted average is 88%, so we can conclude that the performance was good, with high precision, recall and f1 scores. With around 25 images per class, the model was able to distinguish the different classes.